Learning Goals:

  1. Explain the concept of causality within the potential outcomes framework
  2. Explain how randomized experiments can be used to generate evidence in support of causal claims, thereby solving the fundamental problem of causal inference


Review of Enos Trains Study:

  1. What was the hypothesis that Enos wanted to test?
  2. If Enos already knew that people living in areas with higher concentrations of immigrants were more hostile to immigration, why did he decide to run an experiment in the first place?


Establishing Causal Claims

A Working Example

Suppose you wanted to measure the effectiveness of Metababoost™, a new weight-loss drink that has become very popular recently.

You conduct a large tracking survey in which you are able to re-interview the same people at various points over 12 months.

You are interested in how BMI changes amongst people who regularly consume Metababoost™, compared against those who do not. You try as best you can to make sure that these two groups are similar in terms of age, gender, initial BMI, etc.

After one year, you find that people who regularly consume Metababoost™ had a larger drop in BMI than people in the comparison group. The difference is 1.5 kg/m², on average.

  1. Can you conclude that Metababoost™ causes weight loss? Why or why not?
  2. What does it mean to say that Metababoost™ causes the average person to lose 1.5 kg/m²?

Potential Outcomes

Illustration of potential outcomes for the change in BMI, depending on whether or not an individual consumes Metababoost™
Person     BMI Change if No Metababoost™   BMI Change if Metababoost™   Difference
Alex        2                               0                           -2
Bonnie      1                               0                           -1
Colin       0                               0                            0
Danielle    0                              -3                           -3
Earl       -3                              -6                           -3
Fiona      -4                              -4                            0
Gaston     -6                              -7                           -1
Hermine    -8                              -10                          -2
AVERAGE    -2.25                           -3.75                        -1.5

To say that Metababoost™ causes a BMI drop of 1.5 kg/m², we mean that in an imaginary counterfactual world where the people who actually drank Metababoost™ instead did not drink it, their BMI would be 1.5 kg/m² higher, on average.

Similarly, we could say that in a counterfactual world where the people who actually didn’t drink Metababoost™ had instead consumed it regularly, their BMI would be 1.5 kg/m² lower, on average.

This is the idea behind causation within the potential outcomes framework.


If we could observe everyone’s potential outcomes (as in the above table), then finding evidence of causation would be easy!
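With the full table in hand, the calculation really is just arithmetic. The Python sketch below encodes the eight hypothetical people from the potential outcomes table as a dictionary (the variable names are our own). It then recovers each person's individual effect and their average:

```python
# Potential outcomes from the illustration above:
# name -> (BMI change if no Metababoost™, BMI change if Metababoost™)
potential_outcomes = {
    "Alex": (2, 0),
    "Bonnie": (1, 0),
    "Colin": (0, 0),
    "Danielle": (0, -3),
    "Earl": (-3, -6),
    "Fiona": (-4, -4),
    "Gaston": (-6, -7),
    "Hermine": (-8, -10),
}

# Individual causal effect: outcome with the drink minus outcome without it
effects = {name: y1 - y0 for name, (y0, y1) in potential_outcomes.items()}

# Average treatment effect (ATE): the mean of the individual effects
ate = sum(effects.values()) / len(effects)

print(effects)
print(ate)  # -1.5
```

This only works because the dictionary holds both columns for every person, which is exactly what real data never gives us.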

Of course, we cannot observe these counterfactual worlds, and thus our real-world data look something like this:

Illustration of observed change in BMI for people who do and don’t drink Metababoost™
No Metababoost™ Metababoost™ Difference
Alex 2 ?
Bonnie 0 ?
Colin 0 ?
Danielle 0 ?
Earl -6 ?
Fiona -4 ?
Gaston -7 ?
Hermine -8 ?

Since we do not observe counterfactual outcomes, how can we estimate a causal effect?


Randomization and Expectation

Imagine you had a box with a large number of tickets inside. On each ticket is written a value from 0 to 50. Your task is to estimate the average value of the tickets in the box. You randomly choose 100 tickets from the box, and the average of these tickets is 35.

What is your best estimate for the average value of the tickets in the whole box?
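A quick simulation shows why the sample average is a sensible answer. In the Python sketch below, the box contents are made up (100,000 tickets with whole numbers drawn uniformly from 0 to 50), so the numbers will not match the 35 in the exercise. It compares one random draw of 100 tickets with the whole box, then repeats the draw many times to show that the sample averages scatter around the box average:

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# A hypothetical box: 100,000 tickets, each showing a whole number from 0 to 50
box = [random.randint(0, 50) for _ in range(100_000)]
box_mean = sum(box) / len(box)

# Draw 100 tickets at random, without replacement
sample = random.sample(box, 100)
sample_mean = sum(sample) / len(sample)

# Repeat the draw many times: on average, the sample averages hit the box average
many_means = [sum(random.sample(box, 100)) / 100 for _ in range(1_000)]
mean_of_means = sum(many_means) / len(many_means)

print(f"average of whole box:             {box_mean:.2f}")
print(f"average of one draw of 100:       {sample_mean:.2f}")
print(f"average of 1,000 sample averages: {mean_of_means:.2f}")
```

Any single draw can miss by a little, but because the draws are random there is no reason for them to miss systematically in one direction.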

Returning to our working example, imagine you had a population of 1000 people. You randomly assign 500 of them to drink Metababoost™ for a year (and you make sure they actually do it). Let’s call these people the treatment group (T), and let’s call the change in BMI you measure for these people their treatment outcomes.

The remaining 500 people are assigned to the control group (C), and you make sure that they do not consume any Metababoost™ during the year. Their changes in BMI constitute the control outcomes.

Just as you can use the average value on your 100 randomly drawn tickets to estimate the average value of all of the tickets in the box, you can think of the observed treatment outcomes as a random sample of all potential treatment outcomes. Thus, the average of these treatment outcomes forms your estimate of the average of potential treatment outcomes.

Similarly, the average of your control outcomes forms your estimate of the average of potential control outcomes.

Taking the difference between these two estimates yields your estimate of the average treatment effect (ATE), that is, the average causal effect of Metababoost™.

NOTE: this only works because you have randomly allocated people into T and C.
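A small simulation can make this procedure concrete. The Python sketch below invents potential outcomes for 1,000 hypothetical people. The distributions, the seed, and the effect centered on -1.5 kg/m² are all assumptions, not data. It then randomly assigns 500 people to T and compares the difference in means with the true ATE, which only a simulation lets us see:

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

N = 1000
# Hypothetical potential outcomes (change in BMI, kg/m²) for each person.
# ASSUMPTION: y0 (no drink) is made up; y1 (drink) adds an individual effect
# drawn around -1.5, so the effect differs from person to person.
y0 = [random.gauss(-2.0, 3.0) for _ in range(N)]
y1 = [y + random.gauss(-1.5, 1.0) for y in y0]
true_ate = statistics.mean(b - a for a, b in zip(y0, y1))  # unknowable in real data

# Randomly assign exactly 500 people to the treatment group (T); the rest form C
treated = set(random.sample(range(N), N // 2))

# Each person reveals only one of their two potential outcomes
treatment_outcomes = [y1[i] for i in range(N) if i in treated]
control_outcomes = [y0[i] for i in range(N) if i not in treated]

# Difference in means: the estimate of the average treatment effect
ate_hat = statistics.mean(treatment_outcomes) - statistics.mean(control_outcomes)

print(f"true ATE:      {true_ate:.2f}")
print(f"estimated ATE: {ate_hat:.2f}")
```

The estimate will not match the true ATE exactly. Because assignment is random, the gap between them is just sampling noise, and it shrinks as the groups get larger.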

Small group discussion: Think now about Enos’ experiment.

  • How did this experiment work?
  • What two groups did he compare?
  • How were people allocated to T and C?
  • In what ways does this allocation resemble (or not) random draws from a box? In other words, in what ways were these two groups different? In what ways were they the same?

Randomization and the Fundamental Problem of Causal Inference

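Recall that the fundamental problem of causal inference is that we never observe both potential outcomes for the same person. This missing information becomes a serious problem when people can self-select into T or C.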

To return to our working example, the people who choose to drink Metababoost™ may be different in terms of their potential outcomes from the people who choose not to drink it. For instance, suppose that people (e.g. Hermine) who cared a lot about diet and exercise also bought Metababoost™, while those (e.g. Colin) who didn’t care so much about fitness avoided the drink.

Illustration of observed change in BMI depending on whether people choose to drink Metababoost™
Person     BMI Change if No Metababoost™   BMI Change if Metababoost™   Difference
Alex        2                               ?
Bonnie      1                               ?
Colin       0                               ?
Danielle    0                               ?
Earl        ?                              -6
Fiona       ?                              -4
Gaston      ?                              -7
Hermine     ?                              -10
AVERAGE     0.75                           -6.75                        -7.5

In this case, we can no longer treat the observed treatment/control outcomes as a random sample of all potential treatment/control outcomes. Comparing the above table to the full schedule of potential outcomes, we can see that people like Hermine would have lost a lot of weight anyway, even if they didn’t drink Metababoost™, while people like Colin would not have lost very much weight no matter what. But since we only observe treatment outcomes for Hermine and control outcomes for Colin, we overstate the effect of Metababoost™: the observed difference is -7.5 kg/m², whereas the true average effect is -1.5 kg/m².
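The arithmetic behind this overstatement can be made explicit. The Python sketch below reuses the hypothetical potential outcomes from the first table, with Earl, Fiona, Gaston, and Hermine as the self-selected drinkers. It also splits the naive difference into two parts. The first is the effect on the drinkers themselves. The second is a selection-bias term: how differently the two groups would have fared without the drink. This is a standard decomposition, not one taken from these notes:

```python
from statistics import mean

# Hypothetical potential outcomes from the first table:
# name -> (BMI change if no Metababoost™, BMI change if Metababoost™)
potential_outcomes = {
    "Alex": (2, 0),
    "Bonnie": (1, 0),
    "Colin": (0, 0),
    "Danielle": (0, -3),
    "Earl": (-3, -6),
    "Fiona": (-4, -4),
    "Gaston": (-6, -7),
    "Hermine": (-8, -10),
}

# People who chose to drink Metababoost™ in the self-selection table
drinkers = {"Earl", "Fiona", "Gaston", "Hermine"}
non_drinkers = set(potential_outcomes) - drinkers


def avg(group, arm):
    """Average potential outcome for a group; arm 0 = no drink, arm 1 = drink."""
    return mean(potential_outcomes[name][arm] for name in group)


# What we actually observe: treated outcomes for drinkers, control outcomes for the rest
naive_estimate = avg(drinkers, 1) - avg(non_drinkers, 0)

# Quantities we could never compute from real data, only from the full table
true_ate = mean(y1 - y0 for y0, y1 in potential_outcomes.values())
effect_on_drinkers = avg(drinkers, 1) - avg(drinkers, 0)
selection_bias = avg(drinkers, 0) - avg(non_drinkers, 0)

print(f"naive estimate:     {naive_estimate:.2f}")      # -7.50
print(f"true ATE:           {true_ate:.2f}")            # -1.50
print(f"effect on drinkers: {effect_on_drinkers:.2f}")  # -1.50
print(f"selection bias:     {selection_bias:.2f}")      # -6.00
```

The naive estimate equals the effect on drinkers plus the selection bias. Most of the -7.5 kg/m² therefore comes from the fact that the drinkers would have lost more weight anyway.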

This is the problem with allowing people to self-select into T and C. More broadly, in the absence of random assignment, estimates of the ATE may be biased: if we reran this study a large number of times, our estimates would not center on the true ATE but would be systematically too large or too small.
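Random assignment avoids this bias, and the eight people in the table are enough to check it exactly. The Python sketch below assumes a design in which exactly 4 of the 8 people are randomly placed in T. It enumerates every one of those equally likely assignments, computes the difference in means for each, and averages them. It uses exact fractions to avoid rounding noise:

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical potential outcomes from the first table:
# name -> (BMI change if no Metababoost™, BMI change if Metababoost™)
potential_outcomes = {
    "Alex": (2, 0),
    "Bonnie": (1, 0),
    "Colin": (0, 0),
    "Danielle": (0, -3),
    "Earl": (-3, -6),
    "Fiona": (-4, -4),
    "Gaston": (-6, -7),
    "Hermine": (-8, -10),
}
names = list(potential_outcomes)

estimates = []
for treated in combinations(names, 4):  # every way to put 4 of the 8 people in T
    control = [n for n in names if n not in treated]
    treated_avg = Fraction(sum(potential_outcomes[n][1] for n in treated), 4)
    control_avg = Fraction(sum(potential_outcomes[n][0] for n in control), 4)
    estimates.append(treated_avg - control_avg)

average_estimate = sum(estimates) / len(estimates)

print(len(estimates))           # 70 equally likely assignments
print(float(average_estimate))  # -1.5, exactly the true ATE
print(float(min(estimates)), float(max(estimates)))  # single estimates still vary
```

Any single randomization can still land far from -1.5. The self-selected split (Earl, Fiona, Gaston, Hermine in T, giving -7.5) is itself one of the 70 possibilities. Averaged over all of them, though, the estimate is exactly right, and that is what "unbiased" means.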

Here is another way of thinking about this problem: suppose there is a third variable – motivation to get fit – which is correlated with both (1) change in BMI and (2) whether or not one buys Metababoost™. In other words, motivation confounds the statistical relationship between drinking Metababoost™ and change in BMI. However, since motivation is not measured, it constitutes a source of omitted variable bias. Randomization solves this problem by making sure that, on average, motivation (plus all other possible confounding variables) is equalized between T and C.
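The confounding story can also be simulated. Everything in the Python sketch below is an assumption for illustration:

- motivation is a standard normal variable;
- more motivated people lose more BMI with or without the drink;
- the drink has a constant effect of -1.5 kg/m²;
- under self-selection, motivated people are more likely to buy it.

The same potential outcomes are then analyzed once under self-selection and once under coin-flip assignment:

```python
import random
import statistics

random.seed(7)  # fixed seed so the illustration is reproducible

N = 10_000
TRUE_EFFECT = -1.5  # ASSUMPTION: the drink shifts everyone's BMI change by the same amount

# Hypothetical unmeasured confounder: motivation to get fit
motivation = [random.gauss(0, 1) for _ in range(N)]

# Potential outcomes: motivated people lose more BMI whether or not they drink
y0 = [-2.0 * m + random.gauss(0, 1) for m in motivation]
y1 = [y + TRUE_EFFECT for y in y0]


def diff_in_means(in_treatment):
    """Observed treated average minus observed control average."""
    treated = [y1[i] for i in range(N) if in_treatment[i]]
    control = [y0[i] for i in range(N) if not in_treatment[i]]
    return statistics.mean(treated) - statistics.mean(control)


def motivation_gap(in_treatment):
    """Average motivation in T minus average motivation in C."""
    treated = [motivation[i] for i in range(N) if in_treatment[i]]
    control = [motivation[i] for i in range(N) if not in_treatment[i]]
    return statistics.mean(treated) - statistics.mean(control)


# Self-selection: more motivated people are more likely to buy the drink
self_selected = [m + random.gauss(0, 1) > 0 for m in motivation]
# Random assignment: a fair coin flip that ignores motivation
randomized = [random.random() < 0.5 for _ in range(N)]

est_self = diff_in_means(self_selected)
est_random = diff_in_means(randomized)

print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"self-selected estimate: {est_self:.2f}  (motivation gap {motivation_gap(self_selected):.2f})")
print(f"randomized estimate:    {est_random:.2f}  (motivation gap {motivation_gap(randomized):.2f})")
```

The analysis never uses the motivation variable, which mimics omitted variable bias. The self-selected comparison mixes the drink's effect with the motivation gap between the groups. The coin flip leaves that gap near zero without anyone having to measure motivation.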


Turning away from our Metababoost™ example, what if we were instead interested in the causal effect of (contextual) diversity?

  • Returning to Putnam, if we compare people who live in ethnically diverse areas against people who live in ethnically homogeneous areas, what types of selection biases might arise?
  • How would randomization (i.e. randomly assigning contexts to people) remove these selection biases?



Experimenting with Repeated Contact

Suppose your friend sees Enos’ results and says:

“Nice experiment, but he’s just documenting a temporary reaction to the unexpected appearance of Latinos in all-white suburbs. Over time, however, people are going to become more comfortable with diversity. For example, cities have historically been magnets for immigration, and the people living there seem to have no problem with diversity.”

Results: